
B&B with Local Heaps #1149

Merged
rapids-bot[bot] merged 117 commits into NVIDIA:release/26.06 from nguidotti:bnb-local-heap on May 29, 2026
Conversation

@nguidotti
Contributor

@nguidotti nguidotti commented Apr 27, 2026

In this PR, each best-first worker has its own local node heap, so that it can push and pop nodes without synchronizing with other workers. Each best-first worker periodically steals a node from a random worker to keep the node distribution roughly balanced across workers. Additionally, each best-first worker has a fixed set of diving workers assigned to it, which dive on its own nodes whenever possible. This essentially eliminates the need for the scheduler thread, freeing one additional thread for useful work.
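The local-heap idea can be illustrated with a minimal sketch. Everything here is hypothetical and is not the code in this PR: `node_t`, `local_heap_t` and `steal_from_random` are illustrative names, and the per-heap mutex is only an assumed way of letting a thief touch another worker's heap. Since the lock is per heap, an owner pushing or popping does not contend with other owners.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <random>
#include <vector>

// Hypothetical node type: only the lower bound matters for this sketch.
struct node_t {
  double lower_bound;
  int id;
};

// Orders the priority queue so that the smallest lower bound is on top.
struct by_lower_bound {
  bool operator()(const node_t& a, const node_t& b) const
  {
    return a.lower_bound > b.lower_bound;
  }
};

// One heap per best-first worker (assumed design, not the PR's node_queue_t).
class local_heap_t {
 public:
  void push(node_t n)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    heap_.push(n);
  }

  // Removes and returns the node with the smallest lower bound, if any.
  std::optional<node_t> pop()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (heap_.empty()) { return std::nullopt; }
    node_t best = heap_.top();
    heap_.pop();
    return best;
  }

  std::size_t size() const
  {
    std::lock_guard<std::mutex> guard(mutex_);
    return heap_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::priority_queue<node_t, std::vector<node_t>, by_lower_bound> heap_;
};

// Picks a random victim other than `thief` and moves the victim's best node
// into the thief's heap. Returns false if nothing could be stolen.
// Note: between pop() and push() the node is in neither heap.
inline bool steal_from_random(std::vector<local_heap_t>& heaps,
                              std::size_t thief,
                              std::mt19937& rng)
{
  assert(thief < heaps.size());
  if (heaps.size() < 2) { return false; }
  std::uniform_int_distribution<std::size_t> pick(0, heaps.size() - 2);
  std::size_t victim = pick(rng);
  if (victim >= thief) { ++victim; }  // skip the thief itself
  std::optional<node_t> stolen = heaps[victim].pop();
  if (!stolen) { return false; }
  heaps[thief].push(*stolen);
  return true;
}
```

In this sketch a worker would call `steal_from_random` every so often (or when its own heap runs dry); how often the PR does so is not stated here.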

This PR also implements a compression scheme for vstatus that uses only 2 bits per entry, which reduces its memory consumption by roughly 4x (previously each entry was stored as an int8_t). Last but not least, this PR replaces std::deque with a fixed-capacity circular_deque_t for the plunge/dive stacks and the idle-worker list.
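As a rough illustration of the 2-bit packing, the sketch below stores four entries per byte. The class name `packed_2bit_t` and its interface are hypothetical, and the sketch assumes each status takes one of at most four values; the actual encoding used for vstatus may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stores values in [0, 3] using 2 bits each, i.e. four entries per byte.
class packed_2bit_t {
 public:
  explicit packed_2bit_t(std::size_t n) : size_(n), bytes_((n + 3) / 4, 0) {}

  // Reads entry i by shifting its 2-bit field down and masking it out.
  std::uint8_t get(std::size_t i) const
  {
    assert(i < size_);
    return static_cast<std::uint8_t>((bytes_[i / 4] >> shift(i)) & 0x3u);
  }

  // Writes entry i: clear the 2-bit field, then OR in the new value.
  void set(std::size_t i, std::uint8_t value)
  {
    assert(i < size_ && value < 4);
    const unsigned cleared = bytes_[i / 4] & ~(0x3u << shift(i));
    bytes_[i / 4] = static_cast<std::uint8_t>(cleared | (unsigned(value) << shift(i)));
  }

  std::size_t size() const { return size_; }
  std::size_t storage_bytes() const { return bytes_.size(); }

 private:
  // Bit offset of entry i inside its byte: 0, 2, 4 or 6.
  static unsigned shift(std::size_t i) { return static_cast<unsigned>(i % 4) * 2u; }

  std::size_t size_;
  std::vector<std::uint8_t> bytes_;
};
```

With one byte per entry, 10 entries take 10 bytes; packed this way they take 3, which is where the roughly 4x saving comes from.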

MIPLIB results (GH200, 10min):

================================================================================
main (1, #1099) vs bnb-local-heap (2)
================================================================================

------------------------------------------------------------------------------------------------------------------------------
|                                        |       Run 1        |       Run 2        |     Abs. Diff.     |   Rel. Diff. (%)   |
------------------------------------------------------------------------------------------------------------------------------
| Feasible                                                 227                  228                   +1                 --- |
| Optimal                                                   75                   78                   +3                 --- |
| Solutions with <0.1% primal gap                          124                  130                   +6                 --- |
| Nodes explored (mean)                              4.866e+06            1.436e+07           +9.496e+06                +195 |
| Nodes explored (shifted geomean)                        6772            1.205e+04                +5275               +77.9 |
| Relative MIP gap (mean)                               0.3264               0.3415             +0.01506               +4.62 |
| Relative MIP gap (shifted geomean)                    0.1156               0.1131              -0.0025               -2.16 |
| Solve time (mean)                                      444.6                441.5               -3.054              -0.687 |
| Solve time (shifted geomean)                           221.5                219.1               -2.327               -1.05 |
| Primal gap (mean)                                      11.57                11.15              -0.4201               -3.63 |
| Primal gap (shifted geomean)                          0.6324               0.5604             -0.07203               -11.4 |
| Primal integral (mean)                                 32.63                33.02              +0.3805               +1.17 |
| Primal integral (shifted geomean)                      6.346                6.405             +0.05989              +0.944 |
------------------------------------------------------------------------------------------------------------------------------

In summary, we explored ~3x as many nodes on average (arithmetic mean) in the same time frame. The number of instances solved to optimality also increased by 3.
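For readers unfamiliar with the aggregates in the table, here is a small sketch of how such numbers are commonly computed. The relative difference matches the table's values, e.g. (1.436e+07 - 4.866e+06) / 4.866e+06 is about +195%. The shifted geometric mean uses the usual textbook definition; the shift value used by the benchmark script is not given in this PR, so `shift` is left as a parameter, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Shifted geometric mean: exp(mean(log(x_i + shift))) - shift.
// The shift damps the influence of very small values.
inline double shifted_geomean(const std::vector<double>& values, double shift)
{
  assert(!values.empty());
  double log_sum = 0.0;
  for (double v : values) {
    assert(v + shift > 0.0);
    log_sum += std::log(v + shift);
  }
  return std::exp(log_sum / static_cast<double>(values.size())) - shift;
}

// Relative difference of run 2 with respect to run 1, in percent.
inline double relative_diff_percent(double run1, double run2)
{
  assert(run1 != 0.0);
  return 100.0 * (run2 - run1) / run1;
}
```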

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

bdice and others added 30 commits April 3, 2026 13:51
Remove dependency on rmm::mr::device_memory_resource base class. Resources
now satisfy the cuda::mr::resource concept directly.

- Replace shared_ptr<device_memory_resource> with value types and
  cuda::mr::any_resource<cuda::mr::device_accessible> for type-erased storage
- Replace set_current_device_resource(ptr) with set_current_device_resource_ref
- Replace set_per_device_resource(id, ptr) with set_per_device_resource_ref
- Remove make_owning_wrapper usage
- Remove dynamic_cast on memory resources (no common base class)
- Remove owning_wrapper.hpp and device_memory_resource.hpp includes
- Add missing thrust/iterator/transform_output_iterator.h include
  (no longer transitively included via CCCL)
…nd deterministic mode.

Signed-off-by: Nicolas Guidotti <224634272+nguidotti@users.noreply.github.com>
Signed-off-by: Nicolas Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas Guidotti <nguidotti@nvidia.com>
… shared_ptr to avoid unnecessary copy.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
…l crash in work-stealing

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
…queue for now. refactoring.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
… are present

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
# Conflicts:
#	cpp/src/utilities/cuda_helpers.cuh
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
# Conflicts:
#	ci/validate_wheel.sh
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
nguidotti added 4 commits May 28, 2026 15:08
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test 37e757a

@rg20 rg20 removed request for a team and tmckayus May 28, 2026 15:13
Comment thread cpp/src/branch_and_bound/branch_and_bound.cpp Outdated
Comment thread cpp/src/branch_and_bound/branch_and_bound.cpp
Comment thread cpp/src/branch_and_bound/node_queue.hpp Outdated
Comment thread cpp/src/branch_and_bound/node_queue.hpp
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test b2e5f8c

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Contributor

@chris-maes chris-maes left a comment

Thanks for the nice discussion @nguidotti and congratulations on the nice performance improvement with this PR. I'm removing the Request changes so that you can merge when ready.

A couple suggestions:

  1. It sounds like the only reason node_queue_t exposes lock and unlock is to allow a worker to steal a node from another. It would probably be better to add a method like node_queue_t::steal_from_victim(node_queue_t& victim) and handle the locking and unlocking of both queues directly in this method. That would allow you to not expose node_queue_t::lock/unlock and make it so that people touching the branch and bound code did not need to be concerned about correctly locking and unlocking the node queue. Inside steal_from_victim you can avoid deadlocks by acquiring the lock on the thief and the victim in sorted order according to their worker id (so this might need to be node_queue_t::steal_from_victim(i_t thief_id, i_t victim_id, node_queue_t& victim))

Ideally, stealing a node is an atomic operation, so that the node is always either in one queue or another, and thus the node's lower bound is always considered. If you are able to make it an atomic operation you can avoid the need to track the lower bound associated with the node separately (which may be prone to bugs).

Also, if diving needs to copy a node from the node queue, and that cannot happen while stealing, you can add a method node_queue_t::copy_node that acquires mutex_ internally.

Maybe you are able to make the above changes before merging.
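The suggestion above can be sketched as follows. This is only an illustration of the idea, not the PR's `node_queue_t`: the members, the `worker_id_` field stored inside the queue (instead of passing thief and victim ids as arguments), and the `node_t` layout are all assumptions. Both mutexes are taken in ascending worker-id order, so two workers stealing from each other at the same time cannot deadlock, and the node moves while both locks are held, so it is always in exactly one queue from any other locked observer's point of view.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <queue>
#include <vector>

// Hypothetical node type for this sketch.
struct node_t {
  double lower_bound;
  int id;
};

struct by_lower_bound {
  bool operator()(const node_t& a, const node_t& b) const
  {
    return a.lower_bound > b.lower_bound;
  }
};

class node_queue_t {
 public:
  explicit node_queue_t(int worker_id) : worker_id_(worker_id) {}

  void push(node_t n)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    heap_.push(n);
  }

  // Copies the best node without removing it (e.g. as a starting point for a dive).
  std::optional<node_t> copy_node() const
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (heap_.empty()) { return std::nullopt; }
    return heap_.top();
  }

  // Moves the victim's best node into this queue while holding both locks.
  bool steal_from_victim(node_queue_t& victim)
  {
    assert(this != &victim && worker_id_ != victim.worker_id_);
    // Lock order is fixed by worker id, independent of who is the thief.
    const bool self_first = worker_id_ < victim.worker_id_;
    std::mutex& first     = self_first ? mutex_ : victim.mutex_;
    std::mutex& second    = self_first ? victim.mutex_ : mutex_;
    std::lock_guard<std::mutex> guard_first(first);
    std::lock_guard<std::mutex> guard_second(second);
    if (victim.heap_.empty()) { return false; }
    heap_.push(victim.heap_.top());
    victim.heap_.pop();
    return true;
  }

 private:
  int worker_id_;
  mutable std::mutex mutex_;
  std::priority_queue<node_t, std::vector<node_t>, by_lower_bound> heap_;
};
```

`std::scoped_lock` (C++17) would be an alternative way to acquire both mutexes without deadlock; the explicit ordering is shown here because it is what the comment proposes.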

  2. Longer term, I think it's worth defining the correct abstractions and data structures to make managing the lower bound simpler. A heap is already the ideal data structure for managing the lower bound, since it inherently takes the lower bound over the nodes it contains. I think we've introduced a lot of book-keeping and other data structures to manage nodes and lower bounds outside the heap. This is likely because we don't have a way to walk nodes in the heap (i.e. the standard C++ data structures only support pushing and popping). If we had a heap where we could walk nodes, I think it would simplify many of the operations within the branch and bound code. Instead of popping a node off the heap, and tracking the lower bound, when solving, we could leave it on the heap and just mark that node as "solve in progress". We would only pop a node from the heap when the solve was completed. When trying to steal a node from a heap or dive from a node, thieves could avoid "solve in progress" nodes. Also, during a plunge we could push child nodes that we are not exploring directly onto the heap, instead of keeping them in a separate stack or circular buffer data structure.

If the code maintained the invariant that all open nodes are in a heap, I think it would be much easier to reason about the correctness of branch and bound.
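A minimal, single-threaded sketch of the "solve in progress" idea is shown below. It swaps the walkable heap for an ordered `std::multimap` keyed by lower bound, which is simply a standard container that supports iteration; a real implementation would need locking and would likely use a different structure. All names (`open_node_set_t`, `acquire`, `complete`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// Hypothetical payload: a node id plus the "solve in progress" mark.
struct open_node_t {
  int id;
  bool in_progress = false;
};

// Holds every open node, including those currently being solved, so the
// smallest key is always the global lower bound over open nodes.
class open_node_set_t {
 public:
  using handle_t = std::multimap<double, open_node_t>::iterator;

  void push(double lower_bound, int id) { nodes_.emplace(lower_bound, open_node_t{id}); }

  // Walks nodes in lower-bound order and claims the first one that is not
  // already being solved. The node stays in the container.
  std::optional<handle_t> acquire()
  {
    for (handle_t it = nodes_.begin(); it != nodes_.end(); ++it) {
      if (!it->second.in_progress) {
        it->second.in_progress = true;
        return it;
      }
    }
    return std::nullopt;
  }

  // Removes a node only once its solve has finished.
  void complete(handle_t handle) { nodes_.erase(handle); }

  std::optional<double> lower_bound() const
  {
    if (nodes_.empty()) { return std::nullopt; }
    return nodes_.begin()->first;
  }

  std::size_t size() const { return nodes_.size(); }

 private:
  std::multimap<double, open_node_t> nodes_;
};
```

Because an acquired node is not removed until `complete` is called, `lower_bound()` needs no separate book-keeping for nodes that are mid-solve.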

@nguidotti nguidotti added the "do not merge" label (Do not merge if this flag is set) May 28, 2026
nguidotti added 2 commits May 29, 2026 13:23
…logic for launching new bfs workers and work stealing

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
…ressing the packed buffer

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test d094751

nguidotti added 2 commits May 29, 2026 15:05
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test 6f7ab06

@nguidotti
Contributor Author

/ok to test d18c1e9

@nguidotti nguidotti removed the "do not merge" label (Do not merge if this flag is set) May 29, 2026
@nguidotti
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit a339f1c into NVIDIA:release/26.06 May 29, 2026
99 checks passed
@nguidotti nguidotti deleted the bnb-local-heap branch May 29, 2026 18:40

Labels

  • improvement: Improves an existing functionality
  • mip
  • non-breaking: Introduces a non-breaking change
  • P0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants